AITopics | markov decision process

Fast Bellman Updates for Wasserstein Distributionally Robust MDPs

Neural Information Processing SystemsFeb-19-2026, 17:09:55 GMT

Markov decision processes (MDPs) often suffer from the sensitivity issue under model ambiguity. In recent years, robust MDPs have emerged as an effective framework to overcome this challenge. Distributionally robust MDPs extend the robust MDP framework by incorporating distributional information of the uncertain model parameters to alleviate the conservative nature of robust MDPs.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

d0949cbcec31c09431610553a284f94a-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 06:03:36 GMT

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Virginia (0.04)
North America > United States > Pennsylvania (0.04)
(4 more...)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
(3 more...)

Add feedback

Policy Gradient for Rectangular Robust Markov Decision Processes

Neural Information Processing SystemsFeb-16-2026, 19:26:39 GMT

However, they do not account for transition uncertainty, whereas learning robust policies can be computationally expensive.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.06)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)

Add feedback

b0a34e3c64f7e842f20ec10479c32b35-Paper-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 14:17:21 GMT

dynamic programming, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)

Add feedback

Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor

Neural Information Processing SystemsFeb-16-2026, 07:50:34 GMT

We introduce the Blackwell discount factor for Markov Decision Processes (MDPs). Classical objectives for MDPs include discounted, average, and Blackwell opti-mality.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
North America > United States > New Hampshire (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Add feedback

Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor

Neural Information Processing SystemsFeb-16-2026, 07:50:30 GMT

We introduce the Blackwell discount factor for Markov Decision Processes (MDPs). Classical objectives for MDPs include discounted, average, and Blackwell opti-mality.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.04)
North America > United States > New Hampshire (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

Add feedback

a264726ebd222124514a32bf0143b83d-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-16-2026, 06:18:15 GMT

artificial intelligence, machine learning, optimization problem, (18 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Basel-City > Basel (0.04)
Asia > Middle East > Republic of Türkiye (0.04)
Asia > China (0.04)

Industry: Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes

Neural Information Processing SystemsFeb-16-2026, 06:18:11 GMT

Risk-averse reinforcement learning (RL) seeks to provide a risk-averse policy for high-stakes real-world decision problems. These high-stake domains include autonomous driving (Jin et al., 2019; Sharma et al., 2020), robot collision avoidance (Ahmadi et al., 2021; Hakobyan and Y ang, 2021),

artificial intelligence, machine learning, optimization problem, (19 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Basel-City > Basel (0.04)
North America > United States > New Hampshire (0.04)
North America > Canada > Quebec > Montreal (0.04)
(3 more...)

Genre: Research Report (0.46)

Industry:

Information Technology (0.87)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.51)

Add feedback

Weakly Coupled Deep Q-Networks

Neural Information Processing SystemsFeb-15-2026, 17:43:42 GMT

"subagents," one for each subproblem, and then combine their solutions to establish

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Oceania > New Zealand (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(2 more...)

Industry:

Transportation > Ground > Road (0.94)
Transportation > Electric Vehicle (0.94)
Automobiles & Trucks (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Sample Complexity of Average-Reward Q-Learning: From Single-agent to Federated Reinforcement Learning

Jiao, Yuchen, Woo, Jiin, Li, Gen, Joshi, Gauri, Chi, Yuejie

arXiv.org Machine LearningJan-21-2026

Average-reward reinforcement learning offers a principled framework for long-term decision-making by maximizing the mean reward per time step. Although Q-learning is a widely used model-free algorithm with established sample complexity in discounted and finite-horizon Markov decision processes (MDPs), its theoretical guarantees for average-reward settings remain limited. This work studies a simple but effective Q-learning algorithm for average-reward MDPs with finite state and action spaces under the weakly communicating assumption, covering both single-agent and federated scenarios. For the single-agent case, we show that Q-learning with carefully chosen parameters achieves sample complexity $\widetilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|\|h^{\star}\|_{\mathsf{sp}}^3}{\varepsilon^3}\right)$, where $\|h^{\star}\|_{\mathsf{sp}}$ is the span norm of the bias function, improving previous results by at least a factor of $\frac{\|h^{\star}\|_{\mathsf{sp}}^2}{\varepsilon^2}$. In the federated setting with $M$ agents, we prove that collaboration reduces the per-agent sample complexity to $\widetilde{O}\left(\frac{|\mathcal{S}||\mathcal{A}|\|h^{\star}\|_{\mathsf{sp}}^3}{M\varepsilon^3}\right)$, with only $\widetilde{O}\left(\frac{\|h^{\star}\|_{\mathsf{sp}}}{\varepsilon}\right)$ communication rounds required. These results establish the first federated Q-learning algorithm for average-reward MDPs, with provable efficiency in both sample and communication complexity.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2601.13642

Country:

Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Add feedback

Collaborating Authors

markov decision process

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Fast Bellman Updates for Wasserstein Distributionally Robust MDPs

d0949cbcec31c09431610553a284f94a-Paper-Conference.pdf

Policy Gradient for Rectangular Robust Markov Decision Processes

b0a34e3c64f7e842f20ec10479c32b35-Paper-Conference.pdf

Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor

Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor

a264726ebd222124514a32bf0143b83d-Supplemental-Conference.pdf

On Dynamic Programming Decompositions of Static Risk Measures in Markov Decision Processes

Weakly Coupled Deep Q-Networks

Sample Complexity of Average-Reward Q-Learning: From Single-agent to Federated Reinforcement Learning